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Abstract 

With recent improvements of protocols for the assembly of transcriptional parts, synthetic biological devices can 
now more reliably be assembled according to a given design. The standardization of parts open up the way for 
in silico design tools that improve the construct and optimize devices with respect to given formal design 
specifications. The simplest such optimization is the selection of kinetic parameters and protein abundances such 
that the specified design constraints are robustly satisfied. In this work we address the problem of determining 
parameter values that fulfill specifications expressed in terms of a functional on the trajectories of a dynamical 
model. We solve this inverse problem by linearizing the forward operator that maps parameter sets to 
specifications, and then inverting it locally. This approach has two advantages over brute-force random sampling. 
First, the linearization approach allows us to map back intervals instead of points and second, every obtained value 
in the parameter region is satisfying the specifications by construction. The method is general and can hence be 
incorporated in a pipeline for the rational forward design of arbitrary devices in synthetic biology. 



Introduction 

Synthetic biology places emphasis on small, standardized 
molecular parts and devices, mostly operating at the tran- 
scriptional level [1,2]. With standardization comes the 
need for rigorous quantitative characterization of such 
devices and for a compositional theory to reliably build 
larger systems from small canonical circuits. For now 
most synthetic circuits implemented in vivo were con- 
structed from a small number of components with topol- 
ogy and parameter values found by trial-and-error. The 
development of larger synthetic systems necessitates the 
use of appropriate design methodologies. In silico analyses 
can provide significant insights into the construction of 
complex synthetic systems, but due to the poor quantifica- 
tion of experimental and micro- environmental conditions, 
the predictive capability of in silico models for in vivo 
implementations remains limited. Apart from experimen- 
tal limitations, modeling attempts to date most often make 
simplifying assumptions about all the perturbations that a 
synthetic construct is facing in vivo. For instance, only a 
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few studies account for the large extrinsic noise [3-5] and 
in particular the one introduced by variations of plasmid 
copy number [6]. Incorporating those realistic in vivo con- 
straints will make computational models more predictive, 
eventually enabling the upfront in silico optimization of 
transcriptional circuits. A first step toward this goal is to 
investigate the parameter dependency of certain behavior- 
ial properties of a circuits. In systems biology attempts 
have already been made to address this problem, however, 
they either rely on purely local measures [7,8] such as con- 
sidered in classical sensitivity analysis [9,10], or perform 
random parameter sampling [11] to determined parameter 
dependencies. 

For a given circuit topology, kinetic parameters and 
other parameters that are involved in controlling the 
expression level of molecular species (e.g. promoter activ- 
ity or number of ribosome binding sites) are important 
design parameters in synthetic biology. A major challenge 
is to find a set of parameters that satisfies the behavioral 
specification of a device [12]. Computer science offers var- 
ious languages to formally define the proper functioning 
of a piece of code or hardware. Such specification lan- 
guages of formal verification are used to check important 



O© 2013 Koeppl et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons 
BiolVlGCl Central Attribution License (http://creativecommons.0rg/licenses/by/2.O), which permits unrestricted use, distribution, and reproduction in 
any medium, provided the original work is properly cited. 



Koeppl et al. BMC Bioinformatics 2013, 14(Suppl 10):S9 
http://www.biomedcentral.eom/1 471 -21 05/1 4/S1 0/S9 



Page 2 of 7 



behavioral properties, such as liveness, safety or fairness 
[13]. One convenient way to specify such properties is to 
use temporal logic, which is considered an extension of 
classical propositional reasoning, where propositional vari- 
ables may change their truth values over time. A promi- 
nent such logic is the linear temporal logic (LTL), where 
the truth value of the propositions is interpreted over a 
linear timeline [13]. Such techniques were already applied 
to investigate robustness of computational models in sys- 
tem biology [14]. 

Mathematically, the design problem is an inverse pro- 
blem and hence inherits the general feature of such pro- 
blems, namely ill-posedness [15,16]. More specifically, for 
a certain behavioral specification one aims to find the cor- 
responding parameter set that gives rise to such behavior. 
An simple example for a quantity in feature space could 
be the concentration of a molecular species at particular 
time-points. The problem is closely related to parameter 
optimization and even more so to robust optimization, 
where an objective function - generally encoding some 
behavioral constraint (e.g. making model trajectories close 
to the measurements) - is optimized to yield the optimal 
parameter set. Ill-posedness refers to the observation that 
two close-by points in specification or behavioral feature 
space may map to very distant points in the parameter 
space, indicating that this mapping is generally not con- 
tractive but rather expansive. The inverse and correspond- 
ing forward problem is illustated in Figure 1. 

In the current analysis we restrict ourselves to models 
obeying the reaction rate equation and hence constitute a 
set of nonlinear ordinary differential equations. In general, 
connected domains may map to disconnected domains, 
for instance if the dynamical system contains bifurcation 
points (e.g. see Figure 1). For the proposed linearization 
approach we will further restrict ourselves to connected 
domains in the respective image space. Moreover, we will 
not resort to specifying behavior through temporal logics 
but will define general specification junctionals. These are 
mappings y/ from an appropriate function space % of 
^-dimensional trajectories (e.g. L 2 ([0, T], MJ 1 )) to the 
m-dimensional reals and we choose the form 



f 

Jo 



g(s,x(s))ds 



with x e X and the feature kernel g : M> 0 x M> 0 -> 
where T c M m . A special and more tractable version of 
the kernel is the convolution, i.e. g(t, x(t)) - h(T - t)x(t). In 
the following we will only require the map x — > g{-> x) to be 
once-differentiable. With this, we can define the forward 
map from a ^-dimensional parameter space to the feature 
space as the composition F = y/ o 0, with cp : MP -> X. 
The trajectories x e X are generated by the reaction rate 
equation 



dt 



x(t) = Nv{x{i), k) and x(0) = x 0 e M> 0 , 



(1) 



with the stoichiometric matrix JSf e Z n x ^ the reaction 
flux vector v : 
meter set. 



l> 0 and k e M p >0 the para- 



Methods 

The brute-force method of determining the parameter 
region that satisfies a certain behavioral specification 
ScJ usually proceeds by Monte Carlo sampling of para- 
meter sets, generating corresponding trajectories accord- 
ing to (1), checking whether those satisfy S and finally 
retaining only those parameter sets that led to satisfied 
specification S. There are two immediate downsides of this 
approach. First, most draws will be unsuccessful for high 
dimensional parameter spaces, for tight specifications, or 
for both. Different approaches using an optimized sam- 
pling [11,17] have been developed to mitigate this pro- 
blem, but are not solving it as they require convergence of 
the sampling. Second, drawing parameter points in M p 
does not provide guarantees that those points belong to a 
connected domain of consistent parameter sets. Here we 
provide first attempts to tackle both problems. 

The main idea is to locally linearize the forward map F 
around some point and then locally invert it. Hence, a 
small enough local patch in feature space can be mapped 
backward to a small patch in parameter space. By succes- 
sively sampling expansion points in their neighborhoods 
(e.g. by the ball-walk algorithm [18]) we can systemati- 
cally cover the entire specification S and obtain the corre- 
sponding parameter region. A series expansion of F 
around some initial parameter set k° reads 



F{k° + dfe) = F(fe°) + 



8F{k) 



dk 



k=k° 



dk + o[dk) 



Defining df=F (k° + dk) - F (k°) we see that a neigh- 
borhood df in feature space to first order can be 
mapped backward using the Moore-Penrose pseudo- 
inverse 

dk = iSdf, 
that we define with care as 



V = lim (L T L + XI)- 1 L T = lim L(LL T + Ai) -1 , 



(2) 



where L denotes the linearized forward map and 
hence is just the m x p matrix 



L = 



dF(k) 



dk 



(3) 



k=k° 



Note, that the limit in (2) exists even if the inverse of 
L T L and LL T do not exist. Such situations are 
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Figure 1 (A) The forward problem of defining a parameter set from which trajectories and their behavioral features are computed. (B) The 
inverse problem of finding a parameter regions for a predetermined behavioral specification region 5. Columns from left to right correspond to 
parameter space, trajectory space and behavioral feature space, respectively. Connected convex sets can map to nonconvex non-connected 
regions. 



encountered as soon as the number of specification fea- 
tures m are less than the number of parameters, i.e. the 
dimension p of the parameter space. Importantly, we 
can compute (3) efficiently using the variational equa- 
tion for the system (1). Observe that 



f 

Jo 



dg{s,x) 



dx 



dx(s, k) 



x=x(s,k) 



dk 



ds, 



k=k° 



where the last terms in the integral is just the sensitivity 
of the solution of (1) to perturbations in k around k°. 
According to the variational equation the sensitivity obeys 
the following ordinary n x p matrix differential equation 



d dx{t,k) 
dt dk 



dv{x, k) dx{t, k) dv{x, k) N 



dx 



dk 



with 



dx{0, k) 

dk 



(4) 



where we skipped the explicit dependency on k° for 
brevity. Note, that (4) is equivalent to the transient sen- 
sitivity analysis of metabolic networks [9,10], proposed 
as an extension of classical metabolic control analysis 
that only deals with steady state sensitivities. For a cer- 
tain I<° the sensitivity of the kernel g is a constant m x n 
matrix that can be computed explicitly. Thus, by jointly 
solving (1) and (4) for some k° together with 



dg{s,x) 



dx 



dx{t, k) 



x=x{t,k°) 



dk 



with L(0) = 0 



k=k° 



up to time T we obtain the linearized map L = L(T). 
Hence, for every sampled I<° and associated feature 
point f° we propose to design a feature ball 

B f o(8) = {feF\ ||/-/°|| 2 <5} 

and map it backward using Z, f . According to the sin- 
gular value decomposition V - LTLV ^with 2 a diagonal 
matrix with non-negative entries [16], the backward 
transformation needs to be a sequence of a rotation, a 
scaling and another rotation and hence the image of Bp 
under L + can only be a ellipsoid in the parameter space 

{LW e Bfo{&)} e RP. 

Clearly, sampling a multivariate region with balls of 
same dimension allow for a complete coverage of the 
region - something that can only be extrapolated when 
using pointwise sampling [11]. The question to efficiently 
sample a region with balls has been addressed in compu- 
tational geometry and efficient randomized algorithms 
are available [18]. 

We remark that the map L is not the best local 
approximation to F(k) in some norm sense. More speci- 
fically we can improve on L if we are giving additional 
samples of the neighborhood Bp (8). Consider we draw 
another k l e B^ then we can construct a rank-one 
update to L 
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V = L + 



AF - LAk 



||Afe|| 



-Ak 1 



(5) 



where AF = F{U) - F(k°) and Ak = K - k°. In particular, 
the rank-one term (5) captures the nonlinear part of F. 
From (5) it follows that the matrix fj satisfies the consis- 
tency property 



L^W -k°) = F(k i )-f°. 



(6) 



Thus, knowing how to construct rank-one updates over 
the domain of interest is equivalent to knowing F(k) 
locally. In fact, \j is the matrix closest to L, with respect to 
the Frobenius norm, that satisfies (6). Subsequently we 
will use this improved linear approximation to F to bound 
the error that one can incurrs if one uses the pseudoin- 
verse V for the backward map. This will also provide 
means to determine the maximal ball size 3 to stay below 
a certain error bound. We quantify the error in the feature 
space by the backward map followed by a forward map. 
That is, we want to find a 3 such that 



\\F(k° + L^{f-f))-f\\ 2 <s 



(7) 



for all/ €5/0(5). 

Now suppose we know a bound p(3) for the Frobenius 
norm of the rank-one perturbation, i.e. ||L — L\\p < p (8) 
in the local domain of interest. Note, that p(S) could and 
need to be estimated by sampling. Given a f e Bp (8) the 
maximal error of the inverse-forward map is 

max \\W(f-f 0 )-(f-f°)\\ 2 

L:\\L-L\\ P <p(8) 

which is known from robust linear squares [16] to be 
equivalent to the error 

| |LL+ (f -f) - (f -f) \\ 2+ p (8) ||L+ (f -f) || 2 . 

Assuming that L has linearly independent rows, LL f is 
the identity matrix and thereby the error simplifies to 

p{S)\\I? (f ! '-/°)l| 2 . 

This result provides one way to determine the radius 
of the feature ball 3 when relying on the pseudo-inverse 

max 8 

8 



subject to 

p(8)\\tf(f-f)\\ 2 <s 
\\f-f°\\2<8 



(8) 



Results 

As a proof of concept of our method, we applied it to a 
simple synthetic sensor construct [19]. The system is 



made of several gene copies (e.g. with plasmid transfec- 
tion), expressing a protein that dimerizes and activates 
the gene by binding to the promoter. In presence of the 
inhibitor (input of the system), the dimer is trapped and 
cannot bind to the promoter. A schematic of the 
involved reactions is depicted in Figure 2. 

The system is simulated according to mass-action and 
obeys 



~dt 

dx 2 
~dt 

dx, 

dt 

dx 4 
"dt" 
dxs 
~dt 



ki(x® — Xs) + k 2 Xs — k 3 X\ 

k^Xi - 2k 5 x\ + 2k 6 x 3 - knx 2 

k 5 x 2 2 - k 6 x 3 - k 7 x 3 y{t) + k 8 x 4 - h{x° 5 - x 5 )x 3 + k w x 5 - k u x 3 (9) 
k 7 x 3 y{t) - k 8 x 4 - k u x 4 
fe 9 (x§ - x 5 )x 3 - k 10 x 5 - k u x 5 . 



where the states x t denote the concentration of 
mRNA, protein, protein-dimer and dimer-promoter 
complex, respectively. The quantities x° s and y(t) refer 
the total number of promoters and the external inhibi- 
tor concentration, respectively. The nominal value and 
the meaning of the model parameters are summarized 
in Table 1. We remark that such continuous state-space 
model have their limitations for transcriptional circuits 
because they require several gene copies in order to 
neglect the discrete Boolean nature of a single gene. 

For the specified behavioral features, we expect the 
dimer to drop quickly after introduction of inhibitor 
and then quickly regain a high level after the inhibitor is 
washed out of the medium. We also constrain the 
monomeric protein. The specification functionals are 
the integral of the absolute difference to some target 
value x* (s) for the monomer and the dimer concentra- 
tion over two small time intervals for each. More speci- 
fically, 



>iWl = 
_f 2 {x)\ 



(10) 



/ wi(s)[x 2 {s,k) -X2(s)] 2 ds 
Jo 

/ ^2 (s) [^3 (s, k) — xl (s) ] 2 ds 
-Jo 

where w is the temporal weight function chosen to be 

f ° r / e u [t3 , t4 ] tei-htLi) 

iy J [0 otherwise 

The actual values for time-intervals for w x and w 2 , as 
well as the target values are shown together with the 
trajectories for the nominal system (9) in Figure 3. 

For this case study we assume that we have means to 
design the binding rate of the inhibitor to the dimer k 7 
and the binding rate of the dimer to the promoter /c 9 . 
To assess the error incurred by the linearization we con- 
sider the reverse-forward mapping as described in (7). 
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Figure 2 Simple transcriptional sensor construct. The dimerized form (A 2 ) of a protein (A) is its own posit 
the dimer away in an inactive form {A 2 - I). 

V 



Hence for various size of 3 we perform the inverse map- 
ping with V and the forward mapping with F. If the 
inverse map is exact we should obviously obtain a ball 
with the same 3. Any deviation s thereof reflects the 
approximation of F~l by V . In Figure 4 the images of 
Bfo{8) under L f and F ° L f are shown for various radii 3. 

Hence, for an intermediate size of 3 a good trade-off 
between approximation accuracy and sampling coverage 
is achievable. A systematic sampling of a predetermined 
specification area S would proceed by successively sam- 
pling overlapping balls with radii adapted to maintain £ 
under a certain value as illustrated in Figure 5. In this 
example, the coverage of the region S is above 98% 
using 50 balls of different radii. The lower left corner of 
the specification space (Figure 5A) maps to a strongly 
nonlinear region of the parameter space (upper right 



corner in Figure 5B) and therefore forces the use of 
smaller balls to keep the error in acceptable range. On 
the contrary, the upper right region of the specification 
space is more linear and larger balls can be used with 
limited relative error (Figure 5C). 

Conclusion 

We presented a novel method to determine the parameter 
region of a biochemical reaction network that is consistent 
with a certain dynamical, behavioral specification. We 
defined specifications in a novel and general way that 
requires only the specification map to be once differenti- 
ate with respect to the states of the underlying differential 
equations. We showed that by locally linearizing this map 
we can solve the desired inverse problem of finding a para- 
meter region for a given specification. As regions, instead 



Table 1 Nominal values and meaning of the kinetic 
parameters for the model of the synthetic sensor 
construct. 



Basal transcription rate 


*1 


0.02 sec" 1 


Active-promoter transcription rate 


k 2 


0.4 sec" 1 


mRNA degradation rate 


k 3 


0.3 sec" 1 


Protein translation rate 


k 4 


3 (nMsec) " 1 


Dimerization rate 


k 5 


0.1 (nMsec) " 1 


Dimer dissociation rate 


k 6 


0.001 sec" 1 


Inhibitor binding rate 


k 7 


0.011 (nMsec)" 1 


Inhibitor unbinding rate 


ks 


0.2 sec" 1 


Dimer-promoter binding rate 


k 9 


0.21 (nMsec)" 1 


Dimer-promoter unbinding rate 


km 


0.2 sec" 1 


Protein degradation rate 


ku 


0.2 sec" 1 



Values are based on [19] and slightly adapted to obtain a desired threshold 
behavior. 




100 150 200 
time [min] 

Figure 3 Time courses of monomer (A, x 2 ) and dimer (A 2 , x 3 ) 
concentration of (9) for an addition and removal of the inhibitor (/, 
y); the target values and time intervals chosen for the specification 
functionals are indicated by solid black lines. 
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Figure 4 Contours of Bfo{8) (blue) in feature space (first row) are mapped back to the parameter space via /. + (second row) and mapped 
forward using F (red) for increasing size of S (from left to right). 



of points, are mapped back to parameter space the scheme 
is in principle able to cover (given some regularity condi- 
tions) the feature and parameter space - something that is 
not possible with point-wise sampling. We also discuss 
means for estimating the size of the local neighborhood in 
order to guarantee certain approximation errors. The 
computational framework allows a very flexible definition 
of biologically relevant behavorial features and efficient 
determination of the corresponding parameter region. 
Hence, the range of experimentally modifiable parameters, 



such as promoter binding strength can be determined 
upfront before the experimental synthesis of a synthetic 
construct. 

Throughout this work we only considered models 
based on ordinary differential equations, but the outlined 
framework can be extended to include stochastic dyna- 
mical models through the use of moment closure meth- 
ods, for instance. In general, the specification functional 
will then involve the expectation operator and Monte 
Carlo sampling may be required to approximate it. 
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Methods from stochastic sensitivity analysis [20] can be 
applied in order to perform the local inversion. 
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